Machine Learning for Microcontroller-Class Hardware -- A Review
Advancements in machine learning have opened a new opportunity to bring
intelligence to low-end Internet-of-Things nodes such as microcontrollers.
Conventional machine learning deployments have high memory and compute
footprints, hindering their direct deployment on ultra-resource-constrained
microcontrollers. This paper highlights the unique requirements of enabling
onboard machine learning for microcontroller class devices. Researchers use a
specialized model development workflow for resource-limited applications to
ensure the compute and latency budget is within the device limits while still
maintaining the desired performance. We characterize a closed-loop widely
applicable workflow of machine learning model development for microcontroller
class devices and show that several classes of applications adopt a specific
instance of it. We present both qualitative and numerical insights into
different stages of model development by showcasing several use cases. Finally,
we identify open research challenges and unsolved questions that demand
careful consideration moving forward.
Comment: Accepted for publication at IEEE Sensors Journal
Eagle: End-to-end Deep Reinforcement Learning based Autonomous Control of PTZ Cameras
Existing approaches for autonomous control of pan-tilt-zoom (PTZ) cameras use
multiple stages where object detection and localization are performed
separately from the control of the PTZ mechanisms. These approaches require
manual labels and suffer from performance bottlenecks due to error propagation
across the multi-stage flow of information. The large size of object detection
neural networks also makes prior solutions infeasible for real-time deployment
in resource-constrained devices. We present an end-to-end deep reinforcement
learning (RL) solution called Eagle to train a neural network policy that
directly takes images as input to control the PTZ camera. Training
reinforcement learning agents in the real world is cumbersome due to labeling
effort, runtime environment stochasticity, and fragile experimental setups. We
introduce a photo-realistic simulation framework for training and evaluation of
PTZ camera control policies. Eagle achieves superior camera control performance
by maintaining the object of interest close to the center of captured images at
high resolution, achieving up to 17% longer tracking duration than the
state-of-the-art. Eagle policies are lightweight (90x fewer parameters than
YOLOv5s) and can run on embedded camera platforms such as the Raspberry Pi (33
FPS) and Jetson Nano (38 FPS), facilitating real-time PTZ tracking in
resource-constrained environments. With domain randomization, Eagle policies
trained in our simulator can be transferred directly to real-world scenarios.
Comment: 20 pages, IoTD
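The abstract credits domain randomization for the sim-to-real transfer of Eagle policies. As a minimal sketch of the idea — all parameter names and ranges below are hypothetical, not taken from the paper — each training episode samples fresh simulator parameters so the policy never overfits to one simulated appearance or dynamics:

```python
import random

def randomize_episode(rng=random):
    """Sample a fresh set of simulator parameters for one training
    episode. Names and ranges are illustrative only; the point is that
    training across wide variations forces the policy to rely on cues
    that also hold in the real world."""
    return {
        "target_speed": rng.uniform(0.5, 3.0),        # target velocity, m/s
        "lighting_intensity": rng.uniform(0.3, 1.0),  # scene brightness
        "texture_id": rng.randrange(20),              # background texture index
        "camera_latency_ms": rng.uniform(10.0, 80.0), # actuation delay
    }
```

A training loop would call this once per episode and reconfigure the simulator with the returned dictionary before rolling out the policy.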
Learning-enabled Cyber-Physical Systems: Challenges and Strategies
Cyber-physical systems (CPS) are increasingly adopting learning-enabled components with deep neural networks in their decision-making pipelines. Deep neural networks show promise in simplifying CPS pipelines for high-dimensional sensors, as they require little pre-processing of data and have been shown to be more accurate than their traditional counterparts. However, integrating neural networks into the sense-infer-actuate pipeline of CPS faces several challenges. In this dissertation, we study the following challenges in the context of learning-enabled CPS and propose new algorithms and system design strategies to address them.

First, we study the challenge of characterizing uncertainty in sensor data timestamps and its impact on multimodal fusion applications. Motivated by smartphones' integration into several CPS applications, we quantify the data timestamp uncertainty across modern smartphone devices. To our surprise, we find drastic timestamping errors ranging up to multiple seconds on Android devices. We then explore whether these timing errors are significant enough to impact a neural network's performance. Our evaluation shows that the observed timing errors can cripple deep neural networks performing multimodal fusion due to data misalignment. This finding signifies the need to rethink the shared notion of time on smartphones. To mitigate timestamp errors, we introduce approaches to improve time synchronization across smartphones, achieving timing accuracy of up to 200 microseconds. We also propose a novel time-shift data augmentation technique to train time-resilient neural networks that are robust to inevitable timing errors and degrade gracefully when they occur.

As a second challenge, we explore the impact of variable delays on emerging deep reinforcement learning (RL) controllers, which are preferred for their capability to handle high-dimensional data. Conventional controllers can model and account for delay variations in their design.
However, handling variable delays in deep RL is challenging because a black-box neural network represents the controller policy. Researchers currently use domain randomization and worst-case delay modeling to train deep-RL policies over a spread of expected delay variations. We demonstrate significant performance degradation in applications even when using the state-of-the-art domain randomization approach. To address this, we propose Time-in-State RL, a delay-aware deep-RL approach that augments the agent's state with temporal properties (sampling interval and execution latency). Time-in-State RL trains policies that show superior performance by adapting to variable timing characteristics at runtime. We further show that Time-in-State RL outperforms worst-case delay controllers when worst-case delays are significant. We demonstrate the efficacy of Time-in-State RL on HalfCheetah, Ant, and a car in simulation, and on a real scaled car robot.

Third, we study the challenge of modeling the CPS environment to train end-to-end controllers using deep RL for closed-loop systems. We specifically consider the example of autonomous pan-tilt-zoom (PTZ) controllers. Existing autonomous PTZ controllers have multiple stages: detecting objects of interest, short-term tracking, and control of the pan, tilt, and zoom parameters to keep objects in the field of view. These multiple stages suffer from performance bottlenecks, as it is difficult to optimize each step. Further, they are too computationally intensive to realize in real time on embedded camera platforms. Despite these shortcomings, developers adopt existing multi-stage solutions due to the lack of simulators needed to develop end-to-end controller policies.
We propose Eagle, an end-to-end deep-RL approach that uses raw images to control a PTZ camera. To enable successful training of Eagle, we also introduce EagleSim, a simulation framework for studying PTZ cameras in photo-realistic virtual worlds. Our evaluation across a suite of PTZ tracking scenarios shows that Eagle outperforms current multi-stage approaches with superior tracking performance. Further, we show that Eagle policies are transferable to real-scene videos and are lightweight enough for real-time deployment on Raspberry Pi and Jetson Nano class devices.

Finally, we study the challenge of developing machine learning classifiers with optimal accuracy within the desired resource budget of CPS applications. Selecting an optimal classifier is becoming increasingly complex, with many choices of classifiers and their rich hyperparameter spaces. Although several hyperparameter tuning frameworks exist, their practical adoption is hindered by inferior search algorithms, inflexible architectures, software dependencies, or closed-source nature. As a solution, we propose designing a lightweight library with a flexible architecture and state-of-the-art parallel optimization algorithms. We present Mango, a parallel hyperparameter tuning library, to realize the proposed design. Mango has been used in production at Arm for more than 30 months and is available open source. We evaluate Mango on several benchmarks to highlight its superior performance.
We discuss production use cases of Mango in an AutoML framework and a commercial CPU design pipeline. We also showcase another advantage of Mango: enabling hardware-aware neural architecture search to transfer deep neural networks to TinyML platforms (microcontroller-class devices) used by CPS/IoT applications.
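The time-shift data augmentation proposed for the first challenge can be sketched in a few lines. This is a minimal illustration over plain Python sequences, not the dissertation's implementation: during training, one modality is randomly shifted relative to the other so the network learns to tolerate cross-sensor timestamp misalignment.

```python
import random

def time_shift_augment(seq_a, seq_b, max_shift):
    """Randomly shift seq_b relative to seq_a by up to max_shift samples,
    simulating the cross-sensor timestamp misalignment observed on real
    devices. Returns the misaligned pair, truncated to a common length."""
    shift = random.randint(-max_shift, max_shift)
    if shift >= 0:
        a = seq_a[shift:]
        b = seq_b[:len(seq_b) - shift] if shift else seq_b
    else:
        a = seq_a[:len(seq_a) + shift]
        b = seq_b[-shift:]
    # Truncate both modalities to the same length so each training pair
    # stays dimensionally consistent despite the injected offset.
    n = min(len(a), len(b))
    return a[:n], b[:n]
```

Applying this per training batch exposes the fusion network to the full range of misalignments it may encounter at inference time, which is what allows it to degrade gracefully rather than fail on shifted inputs.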
Auritus
Smart ear-worn devices (called earables) are being equipped with various onboard sensors and algorithms, transforming earphones from simple audio transducers into multi-modal interfaces that make rich inferences about human motion and vital signals. However, developing sensory applications using earables is currently quite cumbersome, with several barriers in the way. First, time-series data from earable sensors incorporate information about physical phenomena in complex settings, requiring machine-learning (ML) models learned from large-scale labeled data. This is challenging in the context of earables because large-scale open-source datasets are missing. Second, the small size and compute constraints of earable devices make on-device integration of many existing algorithms for tasks such as human activity and head-pose estimation difficult. To address these challenges, we introduce Auritus, an extendable and open-source optimization toolkit designed to enhance and replicate earable applications. Auritus serves two primary functions. First, Auritus handles data collection, pre-processing, and labeling tasks for creating customized earable datasets using graphical tools. The system includes an open-source dataset with 2.43 million inertial samples related to head and full-body movements, consisting of 34 head poses and 9 activities from 45 volunteers. Second, Auritus provides a tightly integrated hardware-in-the-loop (HIL) optimizer and TinyML interface to develop lightweight, real-time ML models for activity detection and filters for head-pose tracking. To validate the utility of Auritus, we showcase three sample applications, namely fall detection, spatial audio rendering, and augmented reality (AR) interfacing. Auritus recognizes activities with 91% leave-one-out test accuracy (98% test accuracy) using real-time models as small as 6-13 kB. Our models are 98-740× smaller and 3-6% more accurate than the state-of-the-art.
We also estimate head pose with absolute errors as low as 5 degrees using 20 kB filters, achieving up to 1.6× precision improvement over existing techniques. We make the entire system open-source so that researchers and developers can contribute to any layer of the system or rapidly prototype their applications using our dataset and algorithms.
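The abstract does not specify which filters Auritus generates for head-pose tracking. As a rough, generic illustration of why IMU-based pose filters can fit in tens of kilobytes — this is a textbook complementary filter for pitch, not Auritus's actual algorithm — a few arithmetic operations per sample suffice to fuse gyroscope and accelerometer readings:

```python
import math

def complementary_pitch(accel, gyro_pitch_rate, dt, alpha=0.98, pitch=0.0):
    """Estimate pitch (radians) per sample by fusing gyroscope
    integration (responsive, but drifts) with the accelerometer's
    gravity direction (noisy, but drift-free). Generic sketch only.

    accel: iterable of (ax, ay, az) samples in g
    gyro_pitch_rate: iterable of pitch rates in rad/s
    dt: sampling interval in seconds
    """
    estimates = []
    for (ax, ay, az), rate in zip(accel, gyro_pitch_rate):
        # Pitch implied by the measured gravity vector.
        accel_pitch = math.atan2(-ax, math.sqrt(ay * ay + az * az))
        # Blend integrated gyro with the accelerometer reference.
        pitch = alpha * (pitch + rate * dt) + (1.0 - alpha) * accel_pitch
        estimates.append(pitch)
    return estimates
```

The filter's state is a single float and its code a handful of operations, which is the kind of footprint that makes on-device head-pose tracking plausible on earable-class hardware.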